[57] ABSTRACT

A method for reducing noise in a speech signal by controlling suppression of a predetermined band when an input speech signal has a large pitch strength. The noise reduction method is to be used in an apparatus having a signal characteristic calculating unit, an adjustment calculating unit 32, a consonant component value (CE) and relative noise level value calculating unit, a prefilter or Hn value calculating unit, and a spectrum correcting unit as main components. The signal characteristic calculating unit derives a pitch strength of the input speech signal. The adjustment calculating unit derives an adjustment value according to the pitch strength. The CE and NR value calculating unit derives an NR value according to the pitch strength. Then, the Hn value calculating unit derives the Hn value according to the NR value and sets a noise suppression rate of the input speech signal. The spectrum correcting unit 10 reduces the noise of the input speech signal based on the noise suppression rate.

8 Claims, 13 Drawing Sheets 
METHOD BASED ON PITCH-STRENGTH FOR REDUCING NOISE IN PREDETERMINED SUBBANDS OF A SPEECH SIGNAL

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method for reducing noise in speech signals by supplying a speech signal to a speech encoding apparatus having a filter for suppressing a predetermined frequency band of a speech signal to be input to the apparatus itself.

2. Description of the Related Art

In the applied field of a portable phone or speech recognition, it has been required to suppress noises such as circumstance noise and background noise contained in a recorded speech signal, thereby enhancing voice components of the recorded speech signal.

As one technique for enhancing speech or reducing noise, the arrangement with a conditional probability function for adjusting a decay factor is disclosed in "Speech Enhancement Using a Soft-Decision Noise Suppression Filter", R. J. McAulay, M. L. Malpass, IEEE Trans. Acoust., Speech, Signal Processing, Vol. 28, pp. 137 to 145, April 1980 or "Frequency Domain Noise Suppression Approach in Mobile Telephone Systems", J. Yang, IEEE ICASSP, Vol. II, pp. 363 to 366, April 1993, for example.

These techniques for suppressing noise, however, may generate an unnatural tone and a distorted speech because of an inappropriate fixed SNR (signal-to-noise ratio) or an inappropriate suppressing filter. In the practical use, it is not desirable for users to adjust the SNR that is one of the parameters used in a noise suppressing apparatus for maximizing the performance. Moreover, the conventional technique for enhancing a speech signal cannot fully remove noise without by-producing the distortion of the speech signals susceptible to considerable fluctuations in the short-term S/N ratio.

With the above-described speech enhancement or noise reducing method, the technique of detecting the noise domain is employed, in which the input level or power is compared to a pre-set threshold for discriminating the noise domain. However, if the time constant of the threshold value is increased for preventing tracking to the speech, it becomes impossible to follow noise level changes, especially with increases in the noise level, thus leading to mistaken discrimination.

To solve the foregoing problems, the present inventors have proposed a method for reducing noise in a speech signal in the Japanese Patent Application No. Hei 6-99869 (EP 683 482 A2).

The foregoing method for reducing the noise in a speech signal is arranged to suppress the noise by adaptively controlling a maximum likelihood filter adapted for calculating speech components based on the speech presence probability and the SN ratio calculated on the input speech signal. Specifically, the spectral difference, that is, the spectrum of an input signal less an estimated noise spectrum, is employed in calculating the probability of speech occurrence.

Further, the foregoing method for reducing the noise in a speech signal makes it possible to fully remove the noise from the input speech signal, because the maximum likelihood filter is adjusted to the most appropriate filter according to the SN ratio of the input speech signal.

However, the calculation of the probability of speech occurrence needs a complicated operation as well as an enormous amount of operations. Hence, it has been desirable to simplify the calculation.

For example, consider that the speech signal is processed by the noise reducing apparatus and then is input to the apparatus for encoding the speech signal. Since the apparatus for encoding the speech signal provides a high-pass filter or a filter for boosting a high-pass region of the signal, if the noise reducing apparatus has already suppressed the low-pass region of the signal, the apparatus for encoding the speech signal operates to further suppress the low-pass region of the signal, thereby possibly changing the frequency characteristics and reproducing an acoustically unnatural voice.

The conventional method for reducing the noise may also reproduce an acoustically unnatural voice, because the process for reducing the noise is executed not on the strength of the input speech signal such as a pitch strength but simply on the estimated noise level.

For deriving the pitch strength, a method has been known for deriving a pitch lag between the peaks of a time waveform and then an autocorrelated value in the pitch lag. This method, however, uses the autocorrelation function used in a fast Fourier transform and needs to compute a term of (N log N) and further calculate a value of N. Hence, this method needs a complicated operation.

SUMMARY OF THE INVENTION

In view of the foregoing, it is an object of the present invention to provide a method for reducing noise in a speech signal which method makes it possible to simplify the operations for suppressing the noise in an input speech signal.

It is another object of the present invention to provide a method for reducing noise in a speech signal which method makes it possible to suppress a predetermined band when the input speech signal has a large pitch strength.

According to an aspect of the invention, a method for reducing noise in a speech signal for supplying a speech signal to a speech encoding apparatus having a filter for suppressing a predetermined frequency of the input speech signal, includes the step of controlling a frequency characteristic so that the noise suppression rate in the predetermined frequency band is made smaller.

The filter provided in the speech encoding apparatus is arranged to change the noise suppression rate according to the pitch strength of the input speech signal so that the noise suppression rate may be changed according to the pitch strength of the input speech signal.

The predetermined frequency band is located on the low-pass side of the speech signal. The noise suppression rate is changed so as to reduce the noise suppressing rate on the low-pass side of the input speech signal.

According to another aspect of the invention, the noise reducing method for supplying a speech signal to the speech encoding apparatus having a filter for suppressing a predetermined frequency band of the input speech signal includes the step of changing a noise suppression characteristic to a ratio of a signal level to a noise level in each frequency band when suppressing the noise according to the pitch strength of the input speech signal.

According to another aspect of the invention, a noise reducing method for supplying a speech signal to the speech encoding apparatus having a filter for suppressing a predetermined frequency band of the input voice signal includes the step of inputting each of the parameters for determining the noise suppression characteristic to a neural network for discriminating a speech domain from a noise domain of the input speech signal.

According to another aspect of the invention, a noise 
reducing method for supplying a speech signal to the speech 
encoding apparatus having a filter for suppressing a prede- 
termined frequency band of the input speech signal includes 
the step of substantially linearly changing in a dB domain a 
maximum noise suppression rate processed on the charac- 
teristic appearing when suppressing the noise. 

According to another aspect of the invention, a noise reducing method for supplying a speech signal to the speech encoding apparatus having a filter for suppressing a predetermined frequency band of the input speech signal includes the step of obtaining a pitch strength of the input speech signal by calculating an autocorrelation near a pitch obtained by selecting a peak of the signal level. The characteristic used in suppressing the noise is controlled according to the pitch strength.

According to another aspect of the invention, a noise reducing method for supplying a speech signal to the speech encoding apparatus having a filter for suppressing a predetermined frequency band of the input speech signal includes the step of framing the speech signal independently into a frame for deriving parameters indicating the feature of the speech signal and a frame for correcting a spectrum by using the derived parameters.

In operation, with the method for reducing the noise in a 
speech signal according to the invention, the speech signal 
is supplied to the speech encoding apparatus having a filter 
for suppressing the predetermined band of the input speech 
signal by controlling the characteristic of the filter used for 
reducing the noise and reducing the noise suppression rate in 
the predetermined frequency band of the input speech sig- 
nal. 

If the speech encoding apparatus has a filter for suppress- 
ing a low-pass side of the speech signal, the noise suppres- 
sion rate is controlled so that the noise suppression rate is 
made smaller on the low-pass side of the input speech signal. 

With the method for reducing the noise in a speech signal 
according to the present invention, a pitch of the input 
speech signal is detected for obtaining a strength of the 
detected pitch. The frequency characteristic used in sup- 
pressing the noise is controlled according to the obtained 
pitch strength. 

With the method for reducing the noise in a speech signal according to the present invention, when each of the parameters for determining a frequency characteristic used in suppressing the noise is input to the neural network, the speech domain is discriminated from the noise domain in the input speech signal. This discrimination becomes more precise as the number of processing times increases.

With the method for reducing the noise in a speech signal 
according to the present invention, the pitch strength of the 
input speech signal is obtained as follows. Two peaks are 
selected within one period and an autocorrelated value in 
each peak and a cross-correlated value between the peaks 
are derived. The pitch strength is calculated on the autocor- 
related value and the cross-correlated value. The frequency 
characteristic used in suppressing the noise is controlled 
according to the pitch strength. 
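The pitch-strength calculation summarized above can be sketched in a few lines. The sketch follows the form of expressions (3) to (6) of the detailed description: a cross-correlation between the neighborhoods of the two peaks, divided by the larger of the two autocorrelations. The function name `pitch_strength`, the neighborhood sizes `a` and `b`, and the test signal are assumptions made for illustration; the peak positions `m1` and `m2` are taken as given rather than detected.

```python
import math


def pitch_strength(x, m1, m2, a=8, b=8):
    """Pitch strength from two peak positions m1 and m2 in frame x.

    a and b are the numbers of samples taken before and after each peak;
    the value 8 is illustrative and not taken from the patent.
    """
    if min(m1, m2) - a < 0 or max(m1, m2) + b >= len(x):
        raise ValueError("peak neighborhood runs outside the frame")
    offsets = range(-a, b + 1)
    nrg0 = sum(x[m1 + d] * x[m2 + d] for d in offsets)  # cross-correlation
    nrg1 = sum(x[m1 + d] ** 2 for d in offsets)         # autocorrelation at m1
    nrg2 = sum(x[m2 + d] ** 2 for d in offsets)         # autocorrelation at m2
    denom = max(nrg1, nrg2)
    return nrg0 / denom if denom > 0 else 0.0


# A sine with a period of 40 samples has peaks at n = 10, 50, 90, ...
frame = [math.sin(2 * math.pi * n / 40) for n in range(200)]
strength = pitch_strength(frame, 50, 90)

# Halving the signal around the second peak halves the pitch strength.
damped = frame[:70] + [0.5 * v for v in frame[70:]]
damped_strength = pitch_strength(damped, 50, 90)
```

Only sums over short windows around the two peaks are needed, which is where the saving over a full autocorrelation function comes from.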

With the method for reducing the noise in a speech signal according to the present invention, the framing process of the input speech signal is executed independently for a frame for correcting a spectrum and a frame for deriving a parameter indicating the feature of the speech signal. For example, the framing process for deriving the parameter takes more samples than the framing process for correcting the spectrum.

As described above, with the method for reducing the noise in a speech signal according to the present invention, the characteristic of the filter used for reducing the noise is controlled according to the pitch strength of the input speech signal, and the noise suppression rate in the predetermined frequency band of the input speech signal is controlled to be smaller on the high-pass side or the low-pass side. With this control, if the speech signal processed on the noise suppression rate is encoded, no acoustically unnatural voice is reproduced from the speech signal. That is, the tone quality is enhanced.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram showing an essential part of a noise reducing apparatus to which a noise reducing method in a speech signal according to the invention is applied;

FIG. 2 is an explanatory view showing a framing process executed in a framing unit provided in the noise reducing apparatus;

FIG. 3 is an explanatory view showing a pitch detecting process executed in a signal characteristic calculating unit provided in the noise reducing apparatus;

FIG. 4 is a graph showing concrete values of energy E[k] and decay energy E_decay[k] in the noise reducing apparatus;

FIG. 5 is a graph showing concrete values of a RMS value RMS[k], an estimated noise level value MinRMS[k], and a maximum RMS value MaxRMS[k] used in the noise reducing apparatus;

FIG. 6 is a graph showing concrete values of a relative energy dB_rel[k], a maximum SN ratio MaxSNR[k], and one threshold value dBthres_rel[k] for determining the noise, all represented in dB, used in the noise reducing apparatus;

FIG. 7 is a graph showing a function of NR_level[k] defined for a maximum SN ratio MaxSNR[k] in the noise reducing apparatus;

FIGS. 8A to 8B are graphs showing a relation between a value of adj3[w, k] obtained in an adjustment value calculating unit and a frequency in the noise reducing apparatus;

FIG. 9 is an explanatory view showing a method for obtaining a value indicating a distribution of a frequency area of an input signal spectrum in the noise reducing apparatus;

FIG. 10 is a graph showing a relation between a value of NR[w, k] obtained in a CE and NR value calculating unit and a maximum suppressing amount obtained in a Hn value calculating unit provided in the noise reducing apparatus;

FIG. 11 is a block diagram showing an essential portion of a conventional encoding apparatus operated on an algorithm for encoding a predictive linear code excitation that is an example of using the output of the noise reducing apparatus;

FIG. 12 is a block diagram showing an essential portion of a conventional decoding unit for decoding an encoded speech signal provided in the encoding apparatus; and

FIG. 13 is a view showing estimation of a noise domain in the method for reducing noise in a speech signal according to an embodiment of the present invention.

DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 

Hereinafter, a method for reducing noise in a speech signal according to the present invention will be described with reference to the drawings.
FIG. 1 shows a noise reducing apparatus to which the method for reducing the noise in a speech signal according to the present invention is applied.

The noise reducing apparatus includes a noise suppression filter characteristic generating section 35 and a spectrum correcting unit 10. The generating section 35 operates to set a noise suppression rate to an input speech signal applied to an input terminal 13 for a speech signal. The spectrum correcting unit 10 operates to reduce the noise in the input speech signal based on the noise suppression rate as will be described below. The speech signal output at an output terminal 14 for the speech signal is sent to an encoding apparatus that is operated on an algorithm for encoding a predictive linear excitation.

In the noise reducing apparatus, an input speech signal y[t] containing a speech component and a noise component is supplied to the input terminal 13 for the speech signal. The input speech signal y[t] is a digital signal having a sampling frequency of FS. The signal y[t] is sent to a framing unit 21, in which the signal is divided into frames of FL samples. Thereafter, the signal is processed in each frame.

The framing unit 21 includes a first framing portion 22 and a second framing portion 1. The first framing portion 22 operates to modify a spectrum. The second framing portion 1 operates to derive parameters indicating the feature of the speech signal. Both of the portions 22 and 1 are executed in an independent manner. The processed result of the second framing portion 1 is sent to the noise suppression filter characteristic generating section 35 as will be described below. The processed signal is used for deriving the parameters indicating the signal characteristic of the input speech signal. As will be described below, the processed result of the first framing portion 22 is sent to a spectrum correcting unit 10 for correcting the spectrum according to the noise suppression characteristic obtained on the parameter indicating the signal characteristic.

As shown in FIG. 2A, the first framing portion 22 operates to divide the input speech signal into 168 samples, that is, the frame whose length FL is made up of 168 samples, pick up a k-th frame as frame1_k, and then output it to a windowing unit 2. Each frame frame1_k obtained by the first framing portion 22 is picked up at a period of 160 samples. The current frame is overlapped with the previous frame by eight samples.

As shown in FIG. 2B, the second framing portion 1 operates to divide the input speech signal into 200 samples, that is, the frame whose length FL is made up of 200 samples, pick up a k-th frame as frame2_k, and then output the frame to a signal characteristic calculating unit 31 and a filtering unit 8. Each frame frame2_k obtained by the second framing portion 1 is picked up at a period of 160 samples. The current frame is overlapped with the one previous frame frame2_{k+1} by 8 samples and with the one subsequent frame frame2_{k-1} by 40 samples.

Assuming that the sampling frequency FS is 8000 Hz, that is, 8 kHz, the framing operation is executed at regular intervals of 20 ms, because both the first framing portion 22 and the second framing portion 1 have a frame interval FI of 160 samples.

Turning to FIG. 1, prior to processing by a fast Fourier transforming unit 3 that is the next orthogonal transform, the windowing unit 2 performs the windowing operation by a windowing function w_input with respect to each frame signal y-frame1_{j,k} sent from the first framing portion 22. After inverse fast Fourier transform at the final stage of signal processing of the frame-based signal, an output signal is processed by windowing by a windowing function w_output. Examples of the windowing functions w_input and w_output are given by the following equations (1) and (2).

$$w_{input}[j] = \left(\frac{1}{2} - \frac{1}{2}\cos\left(\frac{2\pi j}{FL}\right)\right)^{1/4}, \quad 0 \le j \le FL \tag{1}$$

$$w_{output}[j] = \left(\frac{1}{2} - \frac{1}{2}\cos\left(\frac{2\pi j}{FL}\right)\right)^{3/4}, \quad 0 \le j \le FL \tag{2}$$

Next, the fast Fourier transforming unit 3 performs the fast Fourier transform at 256 points with respect to the frame-based signal y-frame1_{j,k} windowed by the windowing function w_input to produce frequency spectral amplitude values. The resulting frequency spectral amplitude values are output to a frequency dividing unit 4 and a spectrum correcting unit 10.

The noise suppression filter characteristic generating section 35 is composed of a signal characteristic calculating unit 31, the adj value calculating unit 32, the CE and NR value calculating unit 36, and a Hn calculating unit 7.

In the section 35, the frequency dividing unit 4 operates to divide an amplitude value of the frequency spectrum obtained by performing the fast Fourier transform with respect to the input speech signal output from the fast Fourier transforming unit 3 into e.g., 18 bands. The amplitude Y[w, k] of each band in which a band number for identifying each band is w is output to the signal characteristic calculating unit 31, a noise spectrum estimating unit 26 and an initial filter response calculating unit 33. An example of a frequency range used in dividing the frequency into bands is shown in TABLE 1.

These frequency bands are set on the basis of the fact that the perceptive resolution of the human auditory system is lowered towards the higher frequency side. As the amplitudes of the respective ranges, the maximum FFT (Fast Fourier Transform) amplitudes in the respective frequency ranges are employed.

The signal characteristic calculating unit 31 operates to calculate a RMS[k] that is a RMS value for each frame, a dB_rel[k] that is relative energy for each frame, a MinRMS[k] that is an estimated noise level value for each frame, a MaxRMS[k] that is a maximum RMS value for each frame, and a MaxSNR[k] that is a maximum SNR value for each frame from y-frame2_{j,k} output from the second framing portion 1 and Y[w, k] output from the frequency dividing unit 4.
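The two framing processes can be illustrated with a short sketch. It cuts a signal into frames of 168 and 200 samples at a frame interval of 160 samples, as described above. The helper name `split_frames` and the choice of starting both kinds of frame at the same sample are assumptions made for illustration; the text does not state how the two frames are aligned with each other.

```python
def split_frames(y, frame_len, frame_interval=160):
    """Cut y into frames of frame_len samples, one every frame_interval samples."""
    frames = []
    start = 0
    while start + frame_len <= len(y):
        frames.append(y[start:start + frame_len])
        start += frame_interval
    return frames


signal = list(range(1000))             # stand-in for the input samples y[t]
frames1 = split_frames(signal, 168)    # frames used for correcting the spectrum
frames2 = split_frames(signal, 200)    # frames used for deriving the parameters

# Frame interval in milliseconds for FI = 160 samples at FS = 8000 Hz.
interval_ms = 1000 * 160 / 8000
```

With these lengths, consecutive 168-sample frames share 8 samples and consecutive 200-sample frames share 40 samples.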
TABLE 1

Band Number    Frequency Ranges
0              0-125 Hz
1              125-250 Hz
2              250-375 Hz
3              375-563 Hz
4              563-750 Hz
5              750-938 Hz
6              938-1125 Hz
7              1125-1313 Hz
8              1313-1563 Hz
9              1563-1813 Hz
10             1813-2063 Hz
11             2063-2313 Hz
12             2313-2563 Hz
13             2563-2813 Hz
14             2813-3063 Hz
15             3063-3375 Hz
16             3375-3688 Hz
17             3688-4000 Hz
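As a rough illustration of the band division, the sketch below takes the maximum spectral amplitude inside each of the 18 bands of TABLE 1 for one frame. It assumes a 256-point transform at FS = 8000 Hz, so that the bins are 31.25 Hz apart. The function names, the naive DFT (used only to keep the example free of external libraries) and the rule that a bin on a band edge belongs to the upper band are assumptions, not details given in the text.

```python
import cmath
import math

# Band edges in Hz, read from TABLE 1.
BAND_EDGES_HZ = [0, 125, 250, 375, 563, 750, 938, 1125, 1313, 1563,
                 1813, 2063, 2313, 2563, 2813, 3063, 3375, 3688, 4000]


def amplitude_spectrum(frame, n_fft=256):
    """Naive zero-padded DFT; returns the amplitudes of bins 0..n_fft/2."""
    padded = list(frame[:n_fft]) + [0.0] * (n_fft - len(frame))
    return [abs(sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                    for n in range(n_fft)))
            for k in range(n_fft // 2 + 1)]


def band_amplitudes(frame, fs=8000, n_fft=256):
    """Maximum spectral amplitude Y[w] in each band w of TABLE 1."""
    spectrum = amplitude_spectrum(frame, n_fft)
    last = len(BAND_EDGES_HZ) - 2
    y = []
    for w in range(len(BAND_EDGES_HZ) - 1):
        lo, hi = BAND_EDGES_HZ[w], BAND_EDGES_HZ[w + 1]
        in_band = [amp for i, amp in enumerate(spectrum)
                   if lo <= i * fs / n_fft < hi
                   or (w == last and i * fs / n_fft == hi)]
        y.append(max(in_band))
    return y


# A 1000 Hz tone in a 168-sample frame should peak in band 6 (938-1125 Hz).
tone = [math.cos(2 * math.pi * 1000 * n / 8000) for n in range(168)]
Y = band_amplitudes(tone)
```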
At first, the detection of the pitch and the calculation of the pitch strength will be described below.

In detecting the pitch, as shown in FIG. 3, the strongest peak among the frames of the input speech signal y-frame2_{j,k} is detected as a peak x[m1]. Within the period where the peak x[m1] exists, the second strongest peak is detected as a peak x[m2]. m1 and m2 are the values of the time t for the corresponding peaks. The distance of the pitch p is obtained as a distance |m1 - m2| between the peaks x[m1] and x[m2]. As indicated in the expression (6), the maximum pitch strength max_Rxx of the pitch p can be obtained on the basis of a cross-correlating value nrg0 of the peak x[m1] with the peak x[m2] derived by the expressions (3) to (5), an autocorrelation value nrg1 of the peak x[m1], and the autocorrelation value nrg2 of the peak x[m2].

$$nrg0 = \sum_{\Delta t=-a}^{b} x[m1 + \Delta t] \cdot x[m2 + \Delta t] \tag{3}$$

$$nrg1 = \sum_{\Delta t=-a}^{b} x[m1 + \Delta t] \cdot x[m1 + \Delta t] \tag{4}$$

$$nrg2 = \sum_{\Delta t=-a}^{b} x[m2 + \Delta t] \cdot x[m2 + \Delta t] \tag{5}$$

$$max\_Rxx = \frac{nrg0}{\max(nrg1, nrg2)} \tag{6}$$

In succession, the method for deriving each value will be described below.

RMS[k] is a RMS value of the k-th frame frame2_k, which is calculated by the following expression.

$$RMS[k] = \sqrt{\frac{1}{FL} \sum_{j=0}^{FL-1} \left(y\text{-}frame2_{j,k}\right)^2} \tag{7}$$

The relative energy dB_rel[k] of the k-th frame frame2_k indicates the relative energy of the k-th frame associated with the decay energy from the previous frame frame2_{k-1}. This relative energy dB_rel[k] in dB notation is calculated by the following expression (8). The energy value E[k] and the decay energy value E_decay[k] in the expression (8) are derived by the following expressions (9) and (10).

$$dB_{rel}[k] = 10 \log_{10}\left(\frac{E_{decay}[k]}{E[k]}\right) \tag{8}$$

$$E[k] = \sum_{j=1}^{FL} \left(y\text{-}frame2_{j,k}\right)^2 \tag{9}$$

$$E_{decay}[k] = \max\left(E[k],\ \exp\left(\frac{-FI}{0.65 \cdot FS}\right) \cdot E_{decay}[k-1]\right) \tag{10}$$

In the expression (10), the decay time is assumed as 0.65 second.

The concrete values of the energy E[k] and the decay energy E_decay[k] will be shown in FIG. 4.

The maximum RMS value MaxRMS[k] of the k-th frame frame2_k is the necessary value for estimating an estimated noise level value and a maximum SN ratio of each frame to be described below. The value is calculated by the following expression (11). In the expression (11), θ is a decay constant. This constant is preferable to be a value at which the maximum RMS value is decayed by 1/e at a time of 3.2 seconds, concretely, θ = 0.993769.

$$MaxRMS[k] = \max\left(4000,\ RMS[k],\ \theta \cdot MaxRMS[k-1] + (1 - \theta) \cdot RMS[k]\right) \tag{11}$$

The estimated noise level value MinRMS[k] of the k-th frame frame2_k is a minimum RMS value that is preferable to estimating the background noise or the background noise level. This value has to be minimum among the previous five local minimums from the current point, that is, the values meeting the expression (12).

$$\left(RMS[k] < 0.6 \cdot MaxRMS[k] \text{ and } RMS[k] < 4000 \text{ and } RMS[k] < RMS[k+1] \text{ and } RMS[k] < RMS[k-1] \text{ and } RMS[k] < RMS[k-2]\right) \text{ or } \left(RMS[k] < MinRMS\right) \tag{12}$$

The estimated noise level value MinRMS[k] is set so that the level value MinRMS[k] rises in the background speech-free noise. When the noise level is high, the rising rate is exponentially functional. When the noise level is low, a fixed rising rate is used for securing a larger rise.

The concrete values of the RMS value RMS[k], the estimated noise level value MinRMS[k] and the maximum RMS value MaxRMS[k] will be shown in FIG. 5.

The maximum SN ratio MaxSNR[k] of the k-th frame frame2_k is a value estimated by the following expression (13) on the MaxRMS[k] and MinRMS[k].

$$MaxSNR[k] = 20 \log_{10}\left(\frac{MaxRMS[k]}{MinRMS[k]}\right) - 1 \tag{13}$$

Further, a normalizing parameter NR_level[k] in the range from 0 to 1 indicating the relative noise level is calculated from the maximum SN ratio value MaxSNR. The NR_level[k] uses the following function.

$$NR\_level[k] = \begin{cases} \left(\dfrac{1}{2} + \dfrac{1}{2}\cos\left(\pi \cdot \dfrac{MaxSNR[k] - 30}{20}\right)\right) \cdot \left(1 - 0.002\,(MaxSNR[k] - 30)^2\right), & 30 < MaxSNR[k] \le 50 \\ 0.0, & MaxSNR[k] > 50 \\ 1.0, & \text{otherwise} \end{cases} \tag{14}$$

Next, the noise spectrum estimating unit 26 operates to distinguish the speech from the background noise based on the RMS[k], the dB_rel[k], the NR_level[k], the MinRMS[k] and the MaxSNR[k]. That is, if the following condition is met, the signal in the k-th frame is classified as being the background noise. The amplitude value indicated by the classified background noise is calculated as a mean estimated value N[w, k] of the noise spectrum. The value N is output to the initial filter response calculating unit 33.

$$\left(\left(RMS[k] < NoiseRMS_{thres}[k]\right) \text{ or } \left(dB_{rel}[k] > dBthres_{rel}[k]\right)\right) \text{ and } \left(RMS[k] < RMS[k-1] + 200\right) \tag{15}$$

where

$$NoiseRMS_{thres}[k] = \left(1.05 + 0.45 \cdot NR\_level[k]\right) \cdot MinRMS[k]$$

$$dBthres_{rel}[k] = \max\left(MaxSNR[k] - 4.0,\ 0.9 \cdot MaxSNR[k]\right)$$

FIG. 6 shows the concrete values of the relative energy dB_rel[k] in dB notation found in the expression (15), the maximum SN ratio MaxSNR[k], and the dBthres_rel[k] that is one of the threshold values for discriminating the noise.

FIG. 7 shows NR_level[k] that is a function of the MaxSNR[k] found in the expression (14).

If the k-th frame is classified as being the background noise or the noise, the time mean estimated value N[w, k] of the noise spectrum is updated as shown in the following expression (16) by the amplitude Y[w, k] of the input signal spectrum of the current frame. In the value N[w, k], w denotes a band number for each of the frequency-divided bands.

$$N[w, k] = \alpha \cdot \max\left(N[w, k-1],\ Y[w, k]\right) + (1 - \alpha) \cdot \min\left(N[w, k-1],\ Y[w, k]\right), \quad \alpha = \exp\left(\frac{-FI}{0.5 \cdot FS}\right) \tag{16}$$

If the k-th frame is classified as the speech, N[w, k] directly uses the value of N[w, k-1].
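The update rule for the noise spectrum estimate can be sketched as follows: a frame classified as noise blends the larger and the smaller of the previous estimate and the current band amplitude, while a frame classified as speech keeps the previous estimate. The function name `update_noise_spectrum` is hypothetical, and the smoothing constant `alpha` is passed in as a plain parameter; the value 0.9 in the usage lines is purely illustrative.

```python
def update_noise_spectrum(n_prev, y, is_noise, alpha):
    """Return N[w, k] for all bands from N[w, k-1] and Y[w, k].

    n_prev   -- previous noise estimate per band, N[w, k-1]
    y        -- current band amplitudes, Y[w, k]
    is_noise -- True if frame k was classified as background noise
    alpha    -- smoothing constant between 0 and 1 (illustrative parameter)
    """
    if not is_noise:
        # Speech frame: the previous estimate is carried over unchanged.
        return list(n_prev)
    return [alpha * max(n, a) + (1 - alpha) * min(n, a)
            for n, a in zip(n_prev, y)]


N_noise = update_noise_spectrum([10.0, 10.0], [20.0, 5.0], True, alpha=0.9)
N_speech = update_noise_spectrum([10.0, 10.0], [20.0, 5.0], False, alpha=0.9)
```

Because the larger of the two values is weighted by `alpha`, an `alpha` close to 1 makes the estimate follow rises in the noise level quickly and decay slowly.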

Next, on the RMS[k], the MinRMS[k] and the MaxRMS[k], the adj value calculating unit 32 operates to calculate adj[w, k] by the expression (17) using adj1[k], adj2[k] and adj3[w, k], which will be described below. The adj[w, k] is output to the CE and NR value calculating unit 36.

$$adj[w, k] = \min\left(adj1[k],\ adj2[k]\right) - adj3[w, k] \tag{17}$$

Herein, the adj1[k] found in the expression (17) is a value that is effective in suppressing the noise suppressing operation based on the filtering operation (to be described below) in a high SN ratio over all the bands. The adj1[k] is defined in the following expression (18).

$$adj1[k] = \begin{cases} 1, & MaxSNR[k] < 29 \\ 1 - \dfrac{MaxSNR[k] - 29}{14}, & 29 \le MaxSNR[k] < 43 \\ 0, & \text{otherwise} \end{cases} \tag{18}$$

The adj2[k] found in the expression (17) is a value that is effective in suppressing the noise suppression rate based on the above-mentioned filtering operation with respect to a quite high or low noise level. The adj2[k] is defined by the following expression (19).

$$adj2[k] = \begin{cases} 0, & MinRMS[k] < 20 \\ \dfrac{MinRMS[k] - 20}{40}, & 20 \le MinRMS[k] < 60 \\ 1, & 60 \le MinRMS[k] < 1000 \\ 1 - \dfrac{MinRMS[k] - 1000}{1000}, & 1000 \le MinRMS[k] < 1800 \\ 0.2, & MinRMS[k] \ge 1800 \end{cases} \tag{19}$$

The adj3[w, k] found in the expression (17) is a value for controlling the suppressing amount of the noise on the low-pass or the high-pass side when the strength of the pitch p of the input speech signal as shown in FIG. 3, in particular, the maximum pitch strength max_Rxx is large. For example, if the pitch strength is larger than the predetermined value and the input speech signal level is larger than the noise level, the adj3[w, k] takes a predetermined value on the low-pass side as shown in FIG. 8A, changes linearly with the frequency w on the high-pass side and takes a value of 0 in the other frequency bands. On the other hand, if this condition is not met, the adj3[w, k] takes a predetermined value on the low-pass side as shown in FIG. 8B and a value of 0 in the other frequency bands.

As an example, the definition of the adj3[w, k] is indicated in the expression (20).

If

$$\frac{max\_Rxx[t]}{max\_Rxx[0]} > 0.55 \quad \text{and} \quad RMS[k] > 0.8 \cdot MinRMS[k] + 0.2 \cdot MaxRMS[k]$$

then

$$adj3[w, k] = \begin{cases} 0.2, & w < 200\ \text{Hz} \\ 0, & 200\ \text{Hz} \le w < 2375\ \text{Hz} \\ \text{a value changing linearly with } w, & w \ge 2375\ \text{Hz} \end{cases}$$

otherwise

$$adj3[w, k] = \begin{cases} 0.2, & w < 200\ \text{Hz} \\ 0, & w \ge 200\ \text{Hz} \end{cases} \tag{20}$$

In the expression (20), the maximum pitch strength max_Rxx[t] is normalized by using the first maximum pitch strength max_Rxx[0]. The comparison of the input speech level with the noise level is executed by the values derived from the MinRMS[k] and the MaxRMS[k].

The CE and NR value calculating unit 36 operates to obtain an NR value for controlling the filter characteristic and then output the NR value to the Hn value calculating unit 7.

For example, NR[w, k] corresponding to the NR value is defined by the following expression (21).

$$NR[w, k] = \left(1.0 - CE[k]\right) \cdot NR'[w, k] \tag{21}$$

$$NR'[w, k] = \begin{cases} adj[w, k], & NR[w, k-1] - \delta_{NR} < adj[w, k] < NR[w, k-1] + \delta_{NR} \\ NR[w, k-1] - \delta_{NR}, & NR[w, k-1] - \delta_{NR} \ge adj[w, k] \\ NR[w, k-1] + \delta_{NR}, & NR[w, k-1] + \delta_{NR} \le adj[w, k] \end{cases} \qquad \delta_{NR} = 0.004 \tag{22}$$

NR'[w, k] in the expression (21) is obtained by the expression (22) using the adj[w, k] sent from the adj value calculating unit 32.

The CE and NR value calculating unit 36 also operates to calculate CE[k] used in the expression (21). The CE[k] is a value for representing consonant components contained in the amplitude Y[w, k] of the input signal spectrum. Those consonant components are detected for each frame. The concrete detection of the consonants will be described below.

If the pitch strength is larger than the predetermined value and the input speech signal is larger than the noise level, that is, the condition indicated in the first portion of the expression (20) is met, the CE[k] takes a value of 0.5, for example. If the condition is not met, the CE[k] takes a value defined by the below-described method.

At first, a zero crossing is detected at a portion where a sign is inverted from positive to negative or vice versa between the continuous samples in the Y[w, k] or a portion where a sample having a value of 0 is located between the samples having the signs opposed to each other. The number of the zero crossings is detected at each frame. This value is used for the below-described process as a zero cross number ZC[k].
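The frame-level adjustment values of expressions (18) and (19), their combination in expression (17), and the rate limiting of expressions (21) and (22) can be put together in a short sketch. All function names are hypothetical. The denominator 40 in `adj2` is a reading of a damaged figure in the source, chosen because it makes the ramp reach 1 at MinRMS = 60, and `adj3` is passed in as a plain number because its high-pass term is not fully legible.

```python
def adj1(max_snr):
    """Expression (18): 1 below 29 dB, falling linearly to 0 at 43 dB."""
    if max_snr < 29:
        return 1.0
    if max_snr < 43:
        return 1.0 - (max_snr - 29) / 14
    return 0.0


def adj2(min_rms):
    """Expression (19); the denominator 40 is an assumed reading."""
    if min_rms < 20:
        return 0.0
    if min_rms < 60:
        return (min_rms - 20) / 40
    if min_rms < 1000:
        return 1.0
    if min_rms < 1800:
        return 1.0 - (min_rms - 1000) / 1000
    return 0.2


def adj(max_snr, min_rms, adj3):
    """Expression (17) for one band, with adj3 supplied by the caller."""
    return min(adj1(max_snr), adj2(min_rms)) - adj3


def limit_nr(adj_value, nr_prev, delta=0.004):
    """Expression (22): keep adj within +/- delta of the previous NR value."""
    return min(max(adj_value, nr_prev - delta), nr_prev + delta)


def nr_value(adj_value, nr_prev, ce):
    """Expression (21): scale the rate-limited value by (1 - CE[k])."""
    return (1.0 - ce) * limit_nr(adj_value, nr_prev)


# One band in a frame with MaxSNR = 36 dB, MinRMS = 500 and adj3 = 0.2.
a = adj(36, 500, 0.2)
nr = nr_value(a, nr_prev=0.1, ce=0.5)
```

The limit of 0.004 per frame means the NR value can move by at most 0.2 per second at a 20 ms frame interval, so the suppression characteristic changes gradually from frame to frame.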

Next, a tone is detected. The tone means a value
representing a distribution of frequency components of the
Y[w, k], for example, a ratio t'/b' (= tone[k]) of an average
level t' of the input signal spectrum on the high-pass side to
an average level b' of the input signal spectrum on the
low-pass side as shown in FIG. 9. These values t' and b' are
the values t and b at which an error function ERR(fc, b, t)
defined in the below-described expression (23) takes a
minimum value.
In the expression (23), NB denotes a number of bands,
Ymax[w, k] denotes a maximum value of Y[w, k] in the band w,
and fc denotes a point at which the high-pass is separated
from the low-pass. In FIG. 9, at the frequency fc, the average
value of Y[w, k] on the low-pass side takes a value of b. The
average value of Y[w, k] on the high-pass side takes a value
of t.

$$\min_{fc = 2, \ldots, NB-3} ERR(fc, b, t) = \sum_{w=0}^{fc} (Y_{max}[w,k] - b)^2 + \sum_{w=fc+1}^{NB-1} (Y_{max}[w,k] - t)^2 \qquad (23)$$

Based on the RMS value and the number of zero crossings,
a frame close to the frame at which the voiced speech is
detected, that is, a speech proximity frame, is detected. The
syllable proximity frame number spch_prox[k] is obtained by
the below-described expression (24) and then is output.

$$spch\_prox[k] = \begin{cases} 0 & (RMS > 1250) \text{ and } (zc < 70) \\ spch\_prox[k-1] + 1 & \text{otherwise} \end{cases} \qquad (24)$$

Based on the number of the zero crossings, the number of
the speech proximity frames, the tone and the RMS value,
the syllable components in the Y[w, k] of each frame are
detected. As a result of detecting the syllables, CE[k] is
obtained by the below-described expression (25).

$$CE[k] = \begin{cases} E & (tone[k] > 0.6) \text{ and } (C1, C2 \text{ and } C3 \text{ are true}) \text{ and } (C4.1, C4.2, \ldots, \text{ or } C4.7 \text{ is true}) \\ \max\{0,\, CE[k-1] - 0.05\} & \text{otherwise} \end{cases} \qquad (25)$$

Each of the symbols C1, C2, C3, C4.1 to C4.7 is defined in
the following table.

TABLE 2

Symbol    Definition
C1        RMS[k] > CDS0 · MinRMS[k]
C2        ZC[k] > Zlow
C3        spch_prox[k] < T
C4.1      RMS[k] > CDS1 · RMS[k-1]
C4.2      RMS[k] > CDS1 · RMS[k-2]
C4.3      RMS[k] > CDS1 · RMS[k-3]
C4.4      ZC[k] > Zhigh
C4.5      tone[k] > CDS2 · tone[k-1]
C4.6      tone[k] > CDS2 · tone[k-2]
C4.7      tone[k] > CDS2 · tone[k-3]

In the table 2, each value of CDS0, CDS1, CDS2, T, Zlow
and Zhigh is a constant for defining a sensitivity at which the
syllable is detected. For example, these values are such that
CDS0=CDS1=CDS2=1.41, T=20, Zlow=20, and Zhigh=75.
E in the expression (25) takes a value from 0 to 1. The filter
response (to be described below) is adjusted so that the
syllable suppression rate is made closer to the normal rate
as the value of E is closer to 0, while the syllable suppression
rate is made closer to the minimum rate as the value of E
is closer to 1. As an example, the E takes a value of 0.7.

In the table 2, at a certain frame, if the symbol C1 is held,
it indicates that the signal level of the frame is larger than the
minimum noise level. If the symbol C2 is held, it indicates
that the number of the zero crossings is larger than the
predetermined number Zlow of the zero crossings, in this
embodiment, 20. If the symbol C3 is held, it indicates that
the current frame is located within T frames from the frame
at which the voiced speech is detected, in this embodiment,
within 20 frames.

If the symbol C4.1 is held, it indicates the signal level is
changed in the current frame. If the symbol C4.2 is held, it
indicates that the current frame is a frame whose signal level
is changed one frame later than the change of the speech
signal. If the symbol C4.4 is held, it indicates that the number
of the zero crossings is larger than the predetermined zero
crossing number Zhigh, in this embodiment, 75, at the current
frame. If the symbol C4.5 is held, it indicates that the tone
value is changed at the frame. If the symbol C4.6 is held, it
indicates that the current frame is a frame whose tone value
is changed one frame later than the change of the speech
signal. If the symbol C4.7 is held, it indicates that the current
frame is a frame whose tone value is changed two frames
later than the change of the speech signal.

In the expression (25), the conditions that the frame
contains syllable components are as follows: meeting the
conditions of the symbols C1 to C3, keeping the tone[k]
larger than 0.6, and meeting at least one of the conditions of
C4.1 to C4.7.

Further, the initial filter response calculating unit 33
operates to feed the noise time mean value N[w, k] output
from the noise spectrum estimating unit 26 and Y[w, k]
output from the band dividing unit 4 to the filter suppressing
curve table 34, find out a value of H[w, k] corresponding to
Y[w, k] and N[w, k] stored in the filter suppressing curve
table 34 and output the H[w, k] to the Hn value calculating
unit 7. The filter suppressing curve table 34 stores the table
about H[w, k].

The Hn value calculating unit 7 is a pre-filter for reducing
the noise components of the amplitude Y[w, k] of the
spectrum of the input signal that is divided into the bands,
the time mean estimated value N[w, k] of the noise
spectrum, and the NR[w, k]. In the pre-filter, the Y[w, k] is
converted into the Hn[w, k] according to the N[w, k]. Then,
the pre-filter outputs the filter response Hn[w, k]. The Hn[w,
k] value is calculated by the below-described expression
(26).

$$Hn[w,k] = \exp\{NR[w,k] \cdot \ln(H[w][S/N=r])\} \qquad (26)$$

$$20 \log_{10}(Hn[w,k]) = NR[w,k] \cdot K \qquad (27)$$

where K is constant.

The value H[w][S/N=r] in the expression (26) corresponds
to the most appropriate noise suppression filter
characteristic given when the SN ratio is fixed to a certain
value r. This value is tabulated according to the value of
Y[w, k]/N[w, k] and is stored in the filter suppressing curve
table 34. The H[w][S/N=r] is a value changing linearly in
the dB domain.

The transformation of the expression (26) into the
expression (27) results in indicating that the left side of the
function about the maximum suppression rate has a linear
relation with NR[w, k]. The relation between the function and
the NR[w, k] can be indicated as shown in FIG. 10.

The filtering unit 8 operates to perform a filtering process
for smoothing the Hn[w, k] value in the directions of the
frequency axis and the time axis and output the smoothed
signal Ht_smooth[w, k]. The filtering process on the frequency
axis is effective in reducing the effective impulse response
length of the Hn[w, k]. This makes it possible to prevent
occurrence of aliasing caused by circular convolution
resulting from the multiplication-based filter in the frequency
domain. The filtering process on the time axis is effective in
limiting the changing speed of the filter for suppressing
unexpected noise.

At first, the filtering process on the frequency axis will be
described. The median filtering process is carried out about
the Hn[w, k] of each band. The following expressions (28)
and (29) indicate this method.

$$\text{step 1: } H1[w,k] = \max\{\mathrm{median}(Hn[w-1,k],\, Hn[w,k],\, Hn[w+1,k]),\ Hn[w,k]\} \qquad (28)$$

where H1[w, k] = Hn[w, k] in case (w-1) or (w+1) is absent.

$$\text{step 2: } H2[w,k] = \min\{\mathrm{median}(H1[w-1,k],\, H1[w,k],\, H1[w+1,k]),\ H1[w,k]\} \qquad (29)$$

where H2[w, k] = H1[w, k] in case (w-1) or (w+1) is absent.

At the first step (step 1) of the expression (28), H1[w, k]
is an Hn[w, k] with no unique or isolated band of 0. At the
second step (step 2) of the expression (29), H2[w, k] is an
H1[w, k] with no unique or isolated band. Along this
relation, the Hn[w, k] is converted into the H2[w, k].
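The two median steps can be sketched as below, reading expressions (28) and (29) as a three-point median combined with a max (step 1) and a min (step 2). The function names are hypothetical, and the edge bands are copied through unchanged, following the rule for an absent (w-1) or (w+1).

```python
def median3(a, b, c):
    """Median of three values."""
    return sorted((a, b, c))[1]


def smooth_frequency_axis(hn):
    """Apply step 1 and step 2 to the per-band values Hn[w, k] of one frame."""
    n = len(hn)

    # Step 1: raise isolated dips (for example a lone band of 0).
    h1 = list(hn)
    for w in range(1, n - 1):
        h1[w] = max(median3(hn[w - 1], hn[w], hn[w + 1]), hn[w])

    # Step 2: lower isolated peaks.
    h2 = list(h1)
    for w in range(1, n - 1):
        h2[w] = min(median3(h1[w - 1], h1[w], h1[w + 1]), h1[w])
    return h2
```

For example, the input `[0.8, 0.0, 0.8, 0.8, 1.0, 0.8, 0.8]` has one isolated dip and one isolated peak; both are flattened to 0.8.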

Next, the filtering process on the time axis will be 
described. In doing the filtering process on the time axis, it 
is necessary to consider that the input signal has three kinds 
of states, that is, a speech, a background noise, and a 
transient state of the leading edge of the speech. For the 
speech signal H_speech[w, k], as shown in the expression (30),
the smoothing on the time axis is carried out.

$$H_{speech}[w,k] = 0.7 \cdot H2[w,k] + 0.3 \cdot H2[w,k-1] \qquad (30)$$

For the background noise signal, the smoothing on the
time axis as shown in the following expression (31) is
carried out.

$$H_{noise}[w,k] = 0.7 \cdot Min\_H + 0.3 \cdot Max\_H \qquad (31)$$

where

Min_H = min(H2[w, k], H2[w, k-1])

Max_H = max(H2[w, k], H2[w, k-1])

For the transient state signal, the smoothing on the time
axis is not carried out.

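With the foregoing smoothed signal, the calculation of the
expression (32) results in obtaining the smoothed output
signal Ht_smooth[w, k].

$$H_{t\_smooth}[w,k] = (1 - \alpha_{tr}) \cdot \{\alpha_{sp} \cdot H_{speech}[w,k] + (1 - \alpha_{sp}) \cdot H_{noise}[w,k]\} + \alpha_{tr} \cdot H2[w,k] \qquad (32)$$

Herein, α_sp in the expression (32) can be derived from the
following expression (33) and α_tr can be derived from the
following expression (34).

$$\alpha_{sp} = \begin{cases} 1.0 & SNR_{inst} > 4.0 \\ (SNR_{inst} - 1) \cdot \frac{1}{3} & 1.0 < SNR_{inst} < 4.0 \\ 0 & \text{otherwise} \end{cases} \qquad (33)$$

where

$$SNR_{inst} = \frac{RMS[k]}{MinRMS[k]}$$

$$\alpha_{tr} = \begin{cases} 1.0 & \delta_{rms} > 3.5 \\ (\delta_{rms} - 2) \cdot \frac{2}{3} & 2.0 < \delta_{rms} < 3.5 \\ 0 & \text{otherwise} \end{cases} \qquad (34)$$

where

$$\delta_{rms} = \frac{RMS_{local}[k]}{RMS_{local}[k-1]}, \qquad RMS_{local}[k] = \sqrt{\frac{1}{FI} \sum_{j=FI/2}^{FL-FI/2} (y\_frame_{j,k})^2}$$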
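A minimal sketch of the time-axis smoothing follows. It assumes that expression (32) blends the speech-state response of expression (30) and the noise-state response of expression (31) by a weight taken from expression (33), and passes the unsmoothed H2[w, k] through by a weight taken from expression (34). It also assumes that both weights are linear ramps between the printed thresholds (1.0 to 4.0 and 2.0 to 3.5). The names `alpha_sp`, `alpha_tr`, `snr_inst` and `delta_rms` are hypothetical labels for those quantities.

```python
def ramp(x, lo, hi):
    """0 at or below lo, 1 at or above hi, linear in between."""
    if x > hi:
        return 1.0
    if x > lo:
        return (x - lo) / (hi - lo)
    return 0.0


def alpha_sp(snr_inst):
    """Speech weight: (snr_inst - 1) / 3 between 1.0 and 4.0."""
    return ramp(snr_inst, 1.0, 4.0)


def alpha_tr(delta_rms):
    """Transient weight: (delta_rms - 2) * 2 / 3 between 2.0 and 3.5."""
    return ramp(delta_rms, 2.0, 3.5)


def smooth_time_axis(h2_now, h2_prev, snr_inst, delta_rms):
    """Blend the three signal states for one band of one frame."""
    h_speech = 0.7 * h2_now + 0.3 * h2_prev                      # expression (30)
    h_noise = 0.7 * min(h2_now, h2_prev) + 0.3 * max(h2_now, h2_prev)  # (31)
    a_sp = alpha_sp(snr_inst)
    a_tr = alpha_tr(delta_rms)
    return (1.0 - a_tr) * (a_sp * h_speech + (1.0 - a_sp) * h_noise) + a_tr * h2_now
```

With a large `delta_rms` the function returns `h2_now` unchanged, which corresponds to the statement that no time smoothing is applied in the transient state.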

In succession, the band converting unit 9 operates to
expand the smoothed signal Ht_smooth[w, k] of, e.g., 18 bands
from the filtering unit 8 into a signal H128[w, k] of, e.g., 128
bands through interpolation. Then, the band converting unit 9
outputs the resulting signal H128[w, k]. This conversion is
carried out at two stages, for example. The expansion from 18
bands to 64 bands is carried out by a zero-order holding
process. The next expansion from 64 bands to 128 bands is
carried out through a low-pass filter type interpolation.
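The two-stage expansion might be sketched as follows. Both stages rest on assumptions that the text does not confirm: the 18 bands are mapped onto 64 points by a uniform index mapping (the actual band edges are not given here), and linear interpolation stands in for the low-pass filter type interpolation. All function names are hypothetical.

```python
def zero_order_hold(values, n_out):
    """Repeat each input value so that the output has n_out points.

    Assumes a uniform mapping from output index to input band.
    """
    n_in = len(values)
    return [values[i * n_in // n_out] for i in range(n_out)]


def interpolate_x2(values):
    """Double the number of points by inserting midpoints.

    Linear interpolation is used as a simple stand-in for a
    low-pass interpolation filter; the last point is repeated.
    """
    out = []
    for i, v in enumerate(values):
        nxt = values[i + 1] if i + 1 < len(values) else v
        out.append(v)
        out.append(0.5 * (v + nxt))
    return out


def expand_bands(h18):
    """18 bands -> 64 points (hold) -> 128 points (interpolation)."""
    return interpolate_x2(zero_order_hold(h18, 64))
```

Calling `expand_bands` on a list of 18 band gains yields 128 values that can be applied bin by bin.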

Next, the spectrum correcting unit 10 operates to multiply
the signal H128[w, k] by a real part and an imaginary part of
the FFT coefficient obtained by performing the FFT with
respect to the framed signal y_frame_{j,k} from the fast Fourier
transforming unit 3, for modifying the spectrum, that is,
reducing the noise components. Then, the spectrum correcting
unit 10 outputs the resulting signal. Hence, the spectral
amplitude is corrected without transformation of the phase.
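The reason the phase is untouched can be seen in a short sketch: scaling the real part and the imaginary part of a complex coefficient by the same positive real gain changes its magnitude only. The function name is hypothetical, and the sample values are illustrative, not taken from the embodiment.

```python
import cmath


def correct_spectrum(fft_bins, gains):
    """Scale the real and imaginary part of each FFT bin by a real gain."""
    return [complex(g * z.real, g * z.imag) for z, g in zip(fft_bins, gains)]


# Illustrative values: two bins and their per-bin gains.
bins = [complex(3.0, 4.0), complex(0.0, -2.0)]
corrected = correct_spectrum(bins, [0.5, 0.25])

# The magnitude is scaled by the gain while the phase angle is preserved.
magnitude_ratio = abs(corrected[0]) / abs(bins[0])
phase_change = cmath.phase(corrected[0]) - cmath.phase(bins[0])
```

Here `magnitude_ratio` equals the gain applied to the first bin, and `phase_change` is zero up to rounding.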

Next, the reverse fast Fourier transforming unit 11 oper- 
ates to perform the inverse FFT with respect to the signal 
obtained in the spectrum correcting unit 10 and then output 
the resulting IFFT signal. Then, an overlap adding unit 12 
operates to overlap the frame border of the IFFT signal of 
one frame with that of another frame and output the resulting 
output speech signal at the output terminal 14 for the speech 
signal. 
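The overlap adding step can be illustrated as below. This is a sketch under assumptions: the hop between frames and the absence of any synthesis window are choices made for the example, and the function name is hypothetical.

```python
def overlap_add(frames, hop):
    """Sum equal-length frames placed hop samples apart."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for j, value in enumerate(frame):
            # Samples in the overlapped border receive contributions
            # from two neighbouring frames.
            out[k * hop + j] += value
    return out
```

With two frames of four samples and a hop of three, only the one-sample border is summed from both frames.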

Further, consider the case that this output is applied to a
code excited linear prediction (CELP) encoding algorithm, for
example. The conventional algorithm-based encoding
apparatus is illustrated in FIG. 11. The conventional
algorithm-based decoding apparatus is illustrated in FIG. 12.

As shown in FIG. 11, the encoding apparatus is arranged 
so that the input speech signal is applied from an input 
terminal 61 to a linear predictive coding (LPC) analysis unit 
62 and a subtracter 64. 

The LPC analysis unit 62 performs a linear prediction 
about the input speech signal and outputs the predictive filter 
coefficient to a synthesizing filter 63. Two code books, a 
fixed code book 67 and a dynamic code book 68, are 
provided. A code word from the fixed code book 67 is 
multiplied by a gain of a multiplier 82. Another code word 
from the dynamic code book 68 is multiplied by a gain of the 
multiplier 81. Both of the multiplied results are sent to an 
adder 69 in which both are added to each other. The added 
result is input to the LPC synthesis filter having a predictive 
filter coefficient. The LPC synthesis filter outputs the syn- 
thesized result to a subtracter 64. 

The subtracter 64 operates to make a difference between 
the input speech signal and the synthesized result from the 
synthesizing filter 63 and then output it to an acoustical 
weighting filter 65. The filter 65 operates to weight the 
difference signal according to the spectrum of the input 
speech signal in each frequency band and then output the 
weighted signal to an error detecting unit 66. The error 
detecting unit 66 operates to calculate an energy of the 
weighted error output from the filter 65 so as to derive a code 
word for each of the code books so that the weighted error 
energy is made minimum in the search for the code books of 
the fixed code book 67 and the dynamic code book 68. 

The encoding apparatus operates to transmit to the decod- 
ing apparatus an index of the code word of the fixed code 
book 67, an index of the code word of the dynamic code 
book 68 and an index of each gain for each of the multi- 
pliers. The LPC analysis unit 62 operates to transmit a 
quantizing index of each of the parameters on which the 
filter coefficient is generated. The decoding apparatus oper- 
ates to perform a decoding process with each of these 
indexes. 

As shown in FIG. 12, the decoding apparatus also
includes a fixed code book 71 and a dynamic code book 72.
The fixed code book 71 operates to take out the code word
based on the index of the code word of the fixed code book
67. The dynamic code book 72 operates to take out the code
word based on the index of the code word of the dynamic
code book 68. Further, there are provided two multipliers 83
and 84, which are operated on the corresponding gain index.
A numeral 74 denotes a synthesizing filter that receives
some parameters such as the quantizing index from the
encoding apparatus. The synthesizing filter 74 operates to
synthesize the multiplied result of the code word from the
two code books and the gain with an excitation signal and
then output the synthesized signal to a post-filter 75. The
post-filter 75 performs the so-called formant emphasis so
that the peaks and the valleys of the signal are made
clearer. The formant-emphasized speech signal is output
from the output terminal 76.

In order to gain a more preferable speech signal in light 
of the acoustic sense, the algorithm contains a filtering 
process of suppressing the low-pass side of the encoded 
speech signal or boosting the high-pass side thereof. The
decoding apparatus feeds a decoded speech signal whose 
low-pass side is suppressed. 

With the method for reducing the noise of the speech 
signal, as described above, the value of the adj3[w, k] of the 
adj value calculating unit 32 is estimated to have a prede- 
termined value on the low-pass side of the speech signal 
having a large pitch and a linear relation with the frequency 
on the high-pass side of the speech signal. Hence, the 
suppression of the low-pass side of the speech signal is held 
down. This results in avoiding excessive suppression on the 
low-pass side of the speech signal formant-emphasized by 
the algorithm. It means that the encoding process may 
reduce the essential change of the frequency characteristic. 

In the foregoing description, the noise reducing apparatus 
has been arranged to output the speech signal to the speech 
encoding apparatus that performs a filtering process of 
suppressing the low-pass side of the speech signal and 
boosting the high-pass side thereof. Alternatively, by setting the
adj3[w, k] so that the suppression of the high-pass side of the 
speech signal is held down when suppressing the noise, the 
noise reducing apparatus may be arranged to output the 
speech signal to the speech encoding apparatus that operates 
to suppress the high-pass side of the speech signal, for 
example. 

The CE and NR value calculating unit 36 operates to 
change the method for calculating the CE value according to 
the pitch strength and define the NR value on the CE value 
calculated by the method. Hence, the NR value can be 
calculated according to the pitch strength, so that the noise 
suppression is made possible by using the NR value calcu- 
lated according to the input speech signal. This results in 
reducing the spectrum quantizing error. 
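The switch between the two ways of obtaining the CE value can be sketched as follows. The sketch assumes the constants given for Table 2 (CDS0=CDS1=CDS2=1.41, T=20, Zlow=20, Zhigh=75, E=0.7), reads the conditions of expressions (24) and (25) as joined by a logical "and", and assumes that spch_prox[k] is incremented by one in the "otherwise" case. The function names and the `strong_pitch` flag, which stands for the first condition of expression (20), are hypothetical.

```python
CDS0 = CDS1 = CDS2 = 1.41
T, Z_LOW, Z_HIGH = 20, 20, 75
E = 0.7
TONE_MIN = 0.6


def update_spch_prox(prev, rms, zc):
    """Frames elapsed since the last frame judged to be voiced speech."""
    return 0 if (rms > 1250 and zc < 70) else prev + 1


def consonant_effect(rms, min_rms, zc, tone, spch_prox, ce_prev, strong_pitch=False):
    """Return CE[k].

    rms and tone are histories with the current frame last and at
    least three earlier frames before it.
    """
    if strong_pitch:
        return 0.5  # fixed value used when the pitch strength is large
    c1 = rms[-1] > CDS0 * min_rms
    c2 = zc > Z_LOW
    c3 = spch_prox < T
    c4 = (any(rms[-1] > CDS1 * rms[-1 - d] for d in (1, 2, 3))
          or zc > Z_HIGH
          or any(tone[-1] > CDS2 * tone[-1 - d] for d in (1, 2, 3)))
    if tone[-1] > TONE_MIN and c1 and c2 and c3 and c4:
        return E
    return max(0.0, ce_prev - 0.05)  # decay when no consonant is detected
```

When no consonant is detected, the value decays by 0.05 per frame down to 0, so a detected consonant keeps influencing the NR value for several frames.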

The Hn value calculating unit 7 operates to substantially 
linearly change the Hn[w, k] with respect to the NR[w, k] in 
the dB domain so that the contribution of the NR value to the 
change of the Hn value may be constantly serial. Hence, the 
change of the Hn value may comply with the abrupt change 
of the NR value. 
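The linear relation in the dB domain can be checked numerically. The sketch below assumes that expression (26) has the form Hn = exp(NR · ln(H[w][S/N=r])) and treats NR as a plain multiplier; the function names and the table value 0.5 are illustrative only.

```python
import math


def hn_value(h_table, nr):
    """Pre-filter response for one band, following the assumed form of (26)."""
    return math.exp(nr * math.log(h_table))


def to_db(gain):
    """Amplitude gain expressed in dB."""
    return 20.0 * math.log10(gain)


# Illustrative check: the dB value of Hn is NR times a constant K,
# where K is the dB value of the tabulated response.
K = to_db(0.5)
db_values = [to_db(hn_value(0.5, nr)) for nr in (0.0, 0.5, 1.0, 2.0)]
```

Each entry of `db_values` equals the corresponding NR multiplied by `K`, so doubling NR doubles the suppression expressed in dB.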

To calculate the maximum pitch strength in the signal 
characteristic calculating unit 31, it is not necessary to 
perform a complicated operation of the autocorrelation 
function such as (N+logN) used in the FFT process. For 
example, in the case of processing 200 samples, the fore- 
going autocorrelation function needs 50000 processes, while 
the autocorrelation function according to the present inven- 
tion just needs 3000 processes. This can enhance the oper- 
ating speed. 

As shown in FIG. 2A, the first framing unit 22 operates to
sample the speech signal so that the frame length FL
corresponds to 168 samples and the current frame is
overlapped with the one previous frame by eight samples. As
shown in FIG. 2B, the second framing unit 1 operates to
sample the speech signal so that the frame length FL
corresponds to 200 samples and the current frame is
overlapped with the one previous frame by 40 samples and with
the one subsequent frame by 8 samples. The first and the
second framing units 22 and 1 are adjusted to set the starting
position of each frame to the same line, and the second
framing unit 1 performs the sampling operation 32 samples
later than the first framing unit 22. As a result, no delay takes
place between the first and the second framing units 22 and
1, so that more samples may be taken for calculating a signal
characteristic value.
The RMS[k], the MinRMS[k], the tone[w, k], the ZC[w,
k] and the Rxx are used as inputs to a back-propagation type
neural network for estimating a noise interval, as shown in
FIG. 13.

In the neural network, the RMS[k], the MinRMS[k], the
tone[w, k], the ZC[w, k] and the Rxx are applied to each
terminal of the input layer.

The values applied to each terminal of the input layer are
output to the medium layer after a synapse weight is applied
to the values.

The medium layer receives the weighted values and the
bias values from a bias 51. After the predetermined process
is carried out for the values, the medium layer outputs the
processed result. The result is weighted.

The output layer receives the weighted result from the
medium layer and the bias values from a bias 52. After the
predetermined process is carried out for the values, the
output layer outputs the estimated noise intervals.

The bias values output from the biases 51 and 52 and the
weights added to the outputs are adaptively determined for
realizing the so-called preferable transformation. Hence, as
more data is processed the probability is increased. That is,
as the process is repeated more, the estimated noise level and
spectrum are closer to the input speech signal in the
classification of the speech and the noise. This makes it
possible to calculate a precise Hn value.
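The forward pass through the three layers might look like the sketch below. The text does not state the "predetermined process" of each layer, the size of the medium layer, or the weight values, so a sigmoid activation, a two-unit medium layer in the usage, and all names are assumptions made for illustration. The adaptive determination of the weights and biases by back-propagation is omitted.

```python
import math


def sigmoid(x):
    """Assumed activation for the medium and output layers."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Weighted sum plus bias for each unit, followed by the activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def estimate_noise_interval(features, w_hidden, b_hidden, w_out, b_out):
    """Return a score in (0, 1) for one frame.

    features: [RMS, MinRMS, tone, ZC, Rxx], assumed to be scaled to a
    comparable range beforehand.
    """
    hidden = layer(features, w_hidden, b_hidden)   # medium layer, bias 51
    return layer(hidden, w_out, b_out)[0]          # output layer, bias 52
```

A threshold on the returned score (for example 0.5) could then separate noise intervals from speech intervals; that threshold is likewise an assumption.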

What is claimed is:

1. A method for reducing noise in an input speech signal
by supplying the input speech signal to a speech encoding
apparatus having a filter for suppressing a predetermined
frequency band of the input speech signal, comprising the
steps of:

controlling a frequency characteristic of the filter to
reduce a noise suppression rate in the predetermined
frequency band; and

changing the noise suppression rate of the filter according
to a pitch strength of the input speech signal.

2. The noise reduction method as claimed in claim 1, 
wherein the noise suppression rate is changed so that the 
noise suppression rate on a high-pass side of the input 
speech signal is de-emphasized. 

3. The noise reduction method as claimed in claim 1,
wherein the predetermined frequency band is located on a
low-pass side of the input speech signal and the noise
suppression rate of the filter is changed so that the noise
suppression rate on the low-pass side of the input speech
signal is de-emphasized.

4. A method for reducing noise in an input speech signal
by supplying the input speech signal to a speech encoding
apparatus having a filter for suppressing a predetermined
frequency band of a plurality of frequency bands of the input
speech signal, comprising the step of:

changing a noise suppression characteristic of the filter
based on a ratio of a signal level to a noise level in each
of the plurality of frequency bands while suppressing
the noise in the predetermined frequency band according
to a pitch strength of the input speech signal,
wherein the noise suppression characteristic is changed
so that a noise suppression rate is inversely proportional
to the pitch strength.

5. A method for reducing noise in an input speech signal 
by supplying the input speech signal to a speech encoding 
apparatus having a filter for suppressing a predetermined 
frequency band of the input speech signal, comprising the 
steps of: 

inputting parameters for determining a noise suppression 
characteristic to a neural network, the parameters 
including root mean square values, an estimated noise 
level of the input speech signal, and a pitch strength of 
the input speech signal; and 

distinguishing a noise interval of the input speech signal 
from a speech interval of the input speech signal. 

6. A method for reducing noise in an input speech signal 
by supplying the input speech signal to a speech encoding 
apparatus having a filter for suppressing a predetermined 
frequency band of the input speech signal, comprising the 
steps of: 

suppressing the noise in said predetermined frequency 
band according to a pitch strength of the input speech; 
and 

linearly changing a maximum suppression ratio of a noise
suppression characteristic in a dB domain.

7. A method for reducing noise in an input speech signal
by supplying the input speech signal to a speech encoding
apparatus having a filter for suppressing a predetermined
frequency band of the input speech signal, comprising the
steps of:

deriving a pitch strength of the input speech signal by
calculating an autocorrelation value close to a pitch
location obtained by selecting a peak of a signal level;
and

controlling the noise suppression characteristic based on
the pitch strength.

8. A method for reducing noise in an input speech signal
by supplying the input speech signal to a speech encoding
apparatus having a filter for suppressing a predetermined
frequency band of the input speech signal, comprising the
step of:

performing a framing process of the input speech signal
by independently using a frame for calculating parameters
indicating a feature of the input speech signal and
using a frame for correcting a spectrum with the
calculated parameters, wherein

the frame for calculating parameters partially overlaps a
previous frame for calculating parameters, and

the frame for correcting a spectrum partially overlaps a
previous frame for correcting a spectrum.
